Across the Bay Area, air pollution takes a huge toll on the health of communities. Using data from PurpleAir sensors, we found that the effects of air pollution vary widely between and within communities, so we study which areas face particularly high burdens and where new sensors should be planned.

The report takes two cities (Menlo Park and Redwood City) as examples and analyzes geographic equity by mapping the Air Quality Index (AQI) of each block group and plotting the average PM2.5 in February. Population equity is assessed by comparing the PM2.5 distribution across income groups and across racial groups. Finally, in the data equity part, the report offers suggestions to suppliers of PM2.5 sensors by proposing two sets of metrics for identifying which block groups have the greatest need for additional sensors.

Raw sensor data is extracted from PurpleAir and then converted into PM2.5 concentrations and AQI values. There are 1055 sensors in San Mateo County in total. The following map shows the relative AQI of each sensor in the county. From the map we can see that the AQI near East Palo Alto and Redwood City is worse than elsewhere. In absolute terms, however, the AQI of most block groups in the county is Good.
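The report does not show how PM2.5 is converted to AQI. A minimal sketch, assuming the US EPA piecewise-linear formula with the breakpoints that were in force before the 2024 revision (the function name `pm25_to_aqi` is hypothetical and not part of the original analysis):

```python
# Pre-2024 US EPA PM2.5 breakpoints: (c_lo, c_hi, aqi_lo, aqi_hi), in ug/m3.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),        # Good
    (12.1, 35.4, 51, 100),     # Moderate
    (35.5, 55.4, 101, 150),    # Unhealthy for Sensitive Groups
    (55.5, 150.4, 151, 200),   # Unhealthy
    (150.5, 250.4, 201, 300),  # Very Unhealthy
    (250.5, 350.4, 301, 400),  # Hazardous
    (350.5, 500.4, 401, 500),  # Hazardous
]


def pm25_to_aqi(conc):
    """Convert a PM2.5 concentration (ug/m3) to an AQI value."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, aqi_lo, aqi_hi in PM25_BREAKPOINTS:
        if conc <= c_hi:
            # Clamp to c_lo so values in the 0.1 gap between bins stay in range.
            c = max(conc, c_lo)
            # Linear interpolation inside the matching breakpoint interval.
            return round((aqi_hi - aqi_lo) * (c - c_lo) / (c_hi - c_lo) + aqi_lo)
    return 500  # beyond the top breakpoint, cap at the scale maximum


good_limit = pm25_to_aqi(12.0)
```

Under these assumed breakpoints, any concentration up to 12.0 µg/m³ falls in the Good category (AQI 0 to 50).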

Jurisdiction Equity

To compare differences in air quality, we pulled sensors from two cities (Menlo Park and Redwood City) and used the Voronoi technique to transform point estimates of outdoor air quality into estimates for census block groups.
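The idea behind the Voronoi step can be illustrated with a simplified sketch. A location belongs to a sensor's Voronoi cell exactly when that sensor is its nearest one, so averaging nearest-sensor readings over points spread evenly through a block group approximates an area-weighted Voronoi estimate. This is an assumption-laden simplification: the report most likely intersects Voronoi polygons with block group boundaries in GIS software, and the coordinates, readings and function names below are hypothetical.

```python
import math


def nearest_sensor_value(point, sensors):
    """Return the PM2.5 reading of the sensor closest to `point`."""
    nearest = min(sensors, key=lambda s: math.dist(point, (s["x"], s["y"])))
    return nearest["pm25"]


def block_group_estimate(sample_points, sensors):
    """Average the Voronoi-assigned readings over points sampled in a block group."""
    values = [nearest_sensor_value(p, sensors) for p in sample_points]
    return sum(values) / len(values)


# Hypothetical sensors in projected coordinates (illustrative values only).
sensors = [
    {"x": 0.0, "y": 0.0, "pm25": 6.0},
    {"x": 10.0, "y": 0.0, "pm25": 10.0},
]
# Hypothetical sample points spread over one block group.
points = [(1.0, 0.0), (2.0, 1.0), (4.0, 0.0), (8.0, 0.0)]
estimate = block_group_estimate(points, sensors)
```

Here three of the four sample points fall in the first sensor's cell, so the block group estimate sits closer to that sensor's reading.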

Please check this link to see the interactive version of these maps: https://yaojinghuanghe.shinyapps.io/dashboard_pm25/

Menlo Park (mlp)

census map

The following map shows the result of the Voronoi interpolation at the block group level. From the map we can see that places near the bay tend to have higher PM2.5.

time map

The following chart shows the outdoor PM2.5 level in Menlo Park in February 2022. The PM2.5 level in Menlo Park fluctuates over the month.

Redwood City (rwc)

census map

The following map shows the result of the Voronoi interpolation at the block group level. Judging by the PM2.5 level alone, Redwood City's air quality appears slightly worse than Menlo Park's. In addition, the PM2.5 level is higher in the east and in the center of the city than elsewhere.

time map

The following chart shows the outdoor PM2.5 level in Redwood City in February 2022. As in Menlo Park, the PM2.5 level fluctuates over the month. In both cities, PM2.5 tends to be relatively high on weekends and low on weekdays, which may be because more people go out on weekends.

Population equity

We then turned to indoor air pollution. Given the same level of outdoor pollution, the structure of a house and its air purification equipment still determine each household’s exposure to indoor air pollution.

By restricting the PurpleAir sensors to indoor sensors, we examined whether there are income or racial differences in indoor air pollution in San Mateo County.

Please check this link to see the interactive version of these maps: https://yaojinghuanghe.shinyapps.io/dashboard_pm25_equity/

Income Analysis

We collect income data for San Mateo County from the ACS 5-year dataset (2019) at the block group level and divide income into four levels: Less than $24,999 (low), $25,000 to $44,999 (median low), $45,000 to $99,999 (median high) and $100,000 or more (high). We split PM2.5 into five levels.

From the following equity analysis figure we can see that PM2.5 exposure is unequal across income groups. High-income groups are less exposed to bad air quality (in terms of PM2.5), while low-income and median-low-income groups are more exposed. In other words, low-income people are more vulnerable to indoor air pollution.

Race Analysis

We collect census race data for San Mateo County from decennial data (2010-2020) at the block level and divide races into six categories: American Indian and Alaska Native alone, Asian alone, Black or African American alone, Native Hawaiian and Other Pacific Islander alone, Two or more races, and White alone. From the following equity analysis figure we can see that PM2.5 exposure is more clearly unequal across races than across income groups. White residents are less exposed to poor air quality relative to their share of the population.

Data equity

The population equity analysis above assumes that our PM2.5 data is collected evenly across different groups. In reality, this is unlikely to hold. For example, suppliers may be unwilling to install sensors in places with relatively small populations or lower income levels, because doing so yields little economic benefit. Yet we still need data collection to be as equal as possible, since that is the premise of any further analysis.

We therefore try to design one or more scores for the county that communicate the degree to which information on the air quality of different population groups is disproportionately available because of where sensors are located. In this section, we propose a set of score metrics at the block group level that show the neediness of different races, income groups and areas (some block groups still have no sensor at all). Two quantitative models are presented for calculating each jurisdiction’s neediness score. The main idea of our method is that if a group or area already has more sensors than it ‘should’ have (in terms of race, income group and area coverage), its neediness score should be low.

Methodology

We want to collect data equally across races. For example, assume the population of a certain race in the county is \(p\) and the population of this race living within the monitored area is \(p_s\). The share of this race with data (percent with data) is then \(p_w=p_s/p\). Ideally, \(p_w\) should be the same for all races, but as mentioned above this is unlikely in practice. To balance them, we assign a low score to races with high \(p_w\) and a high score to races with low \(p_w\). The same principle is used to balance data collection across income groups and across areas.

race: By studying the percentage of people of each race covered by air quality sensors, we found that White alone had the highest coverage and Some Other Race alone had the lowest. Based on this, we give each race a score; by definition, the higher the score, the greater the need. We give the highest score to Some Other Race alone, and the rest decrease exponentially, with White alone receiving about half of the top score. We then look at the population composition of each census block group and use these scores to obtain a population-weighted average race score for each block group. Finally, we rank the block groups by this score.

income: As with race, we investigated whether there were income differences in the distribution of air quality sensors. We found that sensor coverage was lowest among the lowest-income group and highest among the highest-income group. We therefore scored each income group by its sensor coverage and calculated a weighted average for each census block group.

cover area: Next, we assign scores according to the share of each block group's area covered by air quality sensors. Since many block groups have a coverage rate as high as 100%, we treat them as tied in rank. The higher the coverage, the lower the score, reflecting that these areas already have sufficient coverage.

Please check this link to see the interactive version of these maps: https://yaojinghuanghe.shinyapps.io/dashboard_data_equity_score_perc/

Taking each outdoor PurpleAir sensor as the center, we draw a circle with a radius of 1/8 mile. We assume that the air quality within 1/8 mile can be represented by a single sensor. The union of these circles is therefore the area covered by all air monitoring sensors in San Mateo County.
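The covered share of an area under this 1/8-mile assumption can be approximated numerically. The sketch below is only an illustration: it uses a rectangle as a stand-in for a block group and tests grid cell centers against the sensor circles, whereas the report presumably buffers sensors and intersects the result with block group polygons in GIS software. The function name and inputs are hypothetical.

```python
import math


def covered_fraction(bounds, sensors, radius=0.125, steps=200):
    """Estimate the share of a rectangle lying within `radius` of any sensor.

    bounds  -- (xmin, ymin, xmax, ymax) in miles; a stand-in for a block group
    sensors -- list of (x, y) sensor locations in miles
    """
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / steps
    dy = (ymax - ymin) / steps
    covered = 0
    for i in range(steps):
        for j in range(steps):
            # Test the center of each grid cell against every sensor circle.
            x = xmin + (i + 0.5) * dx
            y = ymin + (j + 0.5) * dy
            if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (steps * steps)


# One hypothetical sensor in the middle of a 1 mile x 1 mile square.
frac = covered_fraction((0.0, 0.0, 1.0, 1.0), [(0.5, 0.5)])
```

A single sensor covers roughly 0.05 square miles, so a one-square-mile area with one sensor is only about 5% covered under this assumption.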

We then look at all census block groups in San Mateo County to study how much each block group needs additional monitoring sensors, and design scoring rules to assign a score. The higher the score, the more vulnerable the area is and the more monitoring sensors are needed. We first study whether there are racial differences in the distribution of air quality sensors.

race

Race Coverage

| race | pop_withdata | pop | perc_withdata |
|---|---|---|---|
| American Indian and Alaska Native alone | 581.1759 | 6812 | 0.0853165 |
| Asian alone | 27172.3859 | 230242 | 0.1180166 |
| Black or African American alone | 1534.0526 | 15707 | 0.0976668 |
| Native Hawaiian and Other Pacific Islander alone | 833.6145 | 9302 | 0.0896167 |
| Some Other Race alone | 8777.6957 | 107924 | 0.0813322 |
| Two or more races | 13179.6663 | 94267 | 0.1398121 |
| White alone | 55460.6632 | 300188 | 0.1847531 |

Cover Area Coverage (first 5 rows)

| cbg | perc_area |
|---|---|
| 060816001001 | 0.0290757 |
| 060816001002 | 0.2605928 |
| 060816001003 | 0.8335262 |
| 060816003002 | 0.0413046 |
| 060816004011 | 0.1475284 |

Income Coverage

| income | pop_withdata | pop | perc_withdata |
|---|---|---|---|
| $100,000 or more | 30712.627 | 154403 | 0.1989121 |
| $25,000 to $44,999 | 4061.908 | 23512 | 0.1727589 |
| $45,000 to $99,999 | 10562.358 | 61629 | 0.1713862 |
| Less than $24,999 | 4079.913 | 23999 | 0.1700035 |

Based on this principle, we use two quantitative methods for assigning the neediness scores.

Percent Method

This method calculates the score by comparing the percent with data across the members of a category. For example, for race coverage, we compare the different races. Specifically, we rescale the percent with data to the range [0, 1]. Because a high percent with data should be penalized with a low score, we then subtract the rescaled value from 1, which gives the following equation.

\[score=1-\frac{p_w-\min(p_w)}{\max(p_w)-\min(p_w)}\]
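The percent-method equation can be sketched in a few lines. The counts below are the `pop_withdata` and `pop` values from the Race Coverage table; the function name `percent_scores` is hypothetical and the code is an illustration rather than the report's own implementation.

```python
def percent_scores(coverage):
    """Min-max rescale coverage shares and invert them, so low coverage scores high."""
    lo, hi = min(coverage.values()), max(coverage.values())
    return {group: 1 - (p - lo) / (hi - lo) for group, p in coverage.items()}


# (pop_withdata, pop) pairs taken from the Race Coverage table.
race_counts = {
    "American Indian and Alaska Native alone": (581.1759, 6812),
    "Asian alone": (27172.3859, 230242),
    "Black or African American alone": (1534.0526, 15707),
    "Native Hawaiian and Other Pacific Islander alone": (833.6145, 9302),
    "Some Other Race alone": (8777.6957, 107924),
    "Two or more races": (13179.6663, 94267),
    "White alone": (55460.6632, 300188),
}

# p_w = p_s / p: the share of each race living inside the monitored area.
race_coverage = {g: p_s / p for g, (p_s, p) in race_counts.items()}
race_scores = percent_scores(race_coverage)
```

The group with the highest coverage receives 0 and the group with the lowest coverage receives 1, with every other group placed linearly in between.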

The following tables show the scores we get with this method.

Race Coverage

| race | pop_withdata | pop | perc_withdata | score |
|---|---|---|---|---|
| White alone | 55460.6632 | 300188 | 0.1847531 | 0.0000000 |
| Two or more races | 13179.6663 | 94267 | 0.1398121 | 0.4345447 |
| Asian alone | 27172.3859 | 230242 | 0.1180166 | 0.6452899 |
| Black or African American alone | 1534.0526 | 15707 | 0.0976668 | 0.8420569 |
| Native Hawaiian and Other Pacific Islander alone | 833.6145 | 9302 | 0.0896167 | 0.9198953 |
| American Indian and Alaska Native alone | 581.1759 | 6812 | 0.0853165 | 0.9614749 |
| Some Other Race alone | 8777.6957 | 107924 | 0.0813322 | 1.0000000 |

Income Coverage

| income | pop_withdata | pop | perc_withdata | score |
|---|---|---|---|---|
| $100,000 or more | 30712.627 | 154403 | 0.1989121 | 0.0000000 |
| $25,000 to $44,999 | 4061.908 | 23512 | 0.1727589 | 0.9046840 |
| $45,000 to $99,999 | 10562.358 | 61629 | 0.1713862 | 0.9521699 |
| Less than $24,999 | 4079.913 | 23999 | 0.1700035 | 1.0000000 |

Cover Area Coverage (first 5 rows)

| cbg | perc_area | score |
|---|---|---|
| 060816001001 | 0.0290757 | 0.9661246 |
| 060816001002 | 0.2605928 | 0.6963898 |
| 060816001003 | 0.8335262 | 0.0288795 |
| 060816003002 | 0.0413046 | 0.9518771 |
| 060816004011 | 0.1475284 | 0.8281183 |

Once we have the score for each item, we can calculate the final neediness score for each block group. For example, to calculate the race score for a specific block group, we multiply the score of each race by the population of that race in the block group, sum the products, and divide the sum by the total population of the block group (a weighted average). This gives the following map, with three scores for each block group.
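The weighted average described here can be sketched as follows. The race scores are the percent-method values from the Race Coverage table; the block group populations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def block_group_score(group_scores, group_pop):
    """Population-weighted average of group scores within one block group."""
    total = sum(group_pop.values())
    if total == 0:
        return None  # no residents, so no score can be computed
    return sum(group_scores[g] * n for g, n in group_pop.items()) / total


# Percent-method race scores from the Race Coverage table.
race_scores = {
    "White alone": 0.0,
    "Asian alone": 0.6452899,
    "Some Other Race alone": 1.0,
}
# Hypothetical population of one block group (not real census counts).
block_group_pop = {
    "White alone": 600,
    "Asian alone": 300,
    "Some Other Race alone": 100,
}
race_score = block_group_score(race_scores, block_group_pop)
```

In this hypothetical block group the race score is about 0.29: the large White alone population, which already has the highest coverage, pulls the average down.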

We can select the block groups that most need additional sensors based on the different scores. For example, if we only want to make data collection more equal across races, we can select the places with high scores in the Race Score layer, such as some block groups near East Palo Alto. Alternatively, one can weight the three scores according to one's own concerns and obtain a new combined score.
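Combining the three scores with user-chosen weights can be sketched as a weighted average. The report does not prescribe a weighting scheme, so the function name, the scores and the weights below are all hypothetical.

```python
def combined_score(scores, weights):
    """Weighted average of the race, income and cover-area scores of one block group."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[k] * w for k, w in weights.items()) / total_weight


# Hypothetical scores of one block group and hypothetical weights
# that count racial equity twice as much as the other two concerns.
scores = {"race": 0.8, "income": 0.6, "cover_area": 0.9}
weights = {"race": 2, "income": 1, "cover_area": 1}
overall = combined_score(scores, weights)
```

Dividing by the total weight keeps the combined score on the same 0 to 1 scale as its inputs, so block groups remain comparable whichever weights are chosen.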

Rank Method

The other method is the rank method with exponential decay, which is the approach described in the Methodology section above. \[score=e^{-\lambda (Rank(p_w)-1)}\] Taking the race score as an example, we give the highest score (1) to the race whose \(p_w\) is lowest (in our case, Some Other Race alone), and the rest decrease exponentially with rank, so that the race whose \(p_w\) is highest (in our case, White alone) receives roughly half of the top score. The race and income scores in the tables below are consistent with \(\lambda=\ln 2/n\), where \(n\) is the number of groups being ranked.

Next, we assign scores according to the share of each block group's area covered by air quality sensors. Since many block groups have a coverage rate as high as 100%, we treat them as tied in rank. The higher the coverage, the lower the score, reflecting that these areas already have sufficient coverage.

As with race, we investigated whether there were income differences in the distribution of air quality sensors. We found that sensor coverage was lowest among the lowest-income group and highest among the highest-income group. We therefore scored each income group by its sensor coverage and calculated a weighted average for each census block group.
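The rank method can be sketched as below. Two points are assumptions rather than statements from the report: the decay rate \(\lambda=\ln 2/n\) is inferred from the published race and income scores, and ties are handled by competition ranking (tied groups share a rank and the following ranks are skipped). The function name `rank_scores` is hypothetical.

```python
import math


def rank_scores(coverage, floor=0.5):
    """Score groups by coverage rank with exponential decay.

    Rank 1 is the lowest coverage and scores 1. Tied groups share a rank
    and the ranks that follow are skipped (competition ranking).
    """
    n = len(coverage)
    decay = -math.log(floor) / n  # lambda = ln(2) / n, inferred from the tables
    scores = {}
    for group, p in coverage.items():
        # Rank = 1 + number of groups with strictly lower coverage.
        rank = 1 + sum(1 for q in coverage.values() if q < p)
        scores[group] = math.exp(-decay * (rank - 1))
    return scores


# perc_withdata values from the Income Coverage table.
income_coverage = {
    "Less than $24,999": 0.1700035,
    "$25,000 to $44,999": 0.1727589,
    "$45,000 to $99,999": 0.1713862,
    "$100,000 or more": 0.1989121,
}
income_scores = rank_scores(income_coverage)
```

Because only the ordering matters, two groups with nearly identical coverage can still receive clearly different scores, which is the main difference from the percent method.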

The following tables show the scores we get with this method.

Race Coverage

| race | pop_withdata | pop | perc_withdata | rank | score |
|---|---|---|---|---|---|
| Some Other Race alone | 8777.6957 | 107924 | 0.0813322 | 1 | 1.0000000 |
| American Indian and Alaska Native alone | 581.1759 | 6812 | 0.0853165 | 2 | 0.9057237 |
| Native Hawaiian and Other Pacific Islander alone | 833.6145 | 9302 | 0.0896167 | 3 | 0.8203354 |
| Black or African American alone | 1534.0526 | 15707 | 0.0976668 | 4 | 0.7429971 |
| Asian alone | 27172.3859 | 230242 | 0.1180166 | 5 | 0.6729501 |
| Two or more races | 13179.6663 | 94267 | 0.1398121 | 6 | 0.6095068 |
| White alone | 55460.6632 | 300188 | 0.1847531 | 7 | 0.5520448 |

Income Coverage

| income | pop_withdata | pop | perc_withdata | rank | score |
|---|---|---|---|---|---|
| Less than $24,999 | 4079.913 | 23999 | 0.1700035 | 1 | 1.0000000 |
| $45,000 to $99,999 | 10562.358 | 61629 | 0.1713862 | 2 | 0.8408964 |
| $25,000 to $44,999 | 4061.908 | 23512 | 0.1727589 | 3 | 0.7071068 |
| $100,000 or more | 30712.627 | 154403 | 0.1989121 | 4 | 0.5946036 |

Cover Area Coverage (first 5 rows)

| cbg | score_cover_area |
|---|---|
| 060816001001 | 0.8184677 |
| 060816001002 | 0.6284733 |
| 060816001003 | 0.5017759 |
| 060816003002 | 0.8083739 |
| 060816004011 | 0.7039799 |

Similarly, we can also produce a score map using this method. Comparing the two, we find that the main results of the two scoring methods are similar. The rank method is more intuitive for the race and income scores, whereas the percent method seems more sensitive for the cover area score. This may be because many block groups have no sensor coverage at all (too many block groups share rank 1, so the next rank might be 40 rather than 2 or 3). These places should score high, but not far above places with only a little sensor coverage.